Skip to content

fix(CS-11647): hide manual approval UI for auto-executed commands#5273

Open
FadhlanR wants to merge 8 commits into
mainfrom
cs-11647-accept-all-flash-before-check-correctness
Open

fix(CS-11647): hide manual approval UI for auto-executed commands#5273
FadhlanR wants to merge 8 commits into
mainfrom
cs-11647-accept-all-flash-before-check-correctness

Conversation

@FadhlanR

@FadhlanR FadhlanR commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Background and Goal

CS-11647: when the AI assistant sent a tool call the host was going to auto-execute, the manual approval UI flashed for ~100ms before auto-execution kicked in. Two surfaces show this UI and both have the same race:

  • The Accept All / Cancel bar at the bottom of the room.
  • The per-command Apply button (the Run / actionVerb button) rendered next to each tool-call message.

The bug isn't checkCorrectness-specific — it applies to every auto-execute branch: checkCorrectness, requiresApproval === false, or LLM mode 'act'. All three flow through the same debounced decision in command-service.drainCommandProcessingQueue.

Where to start

  • packages/host/app/lib/command-auto-execute.ts (new) — isAutoExecutableCommand(command, activeLLMMode, isOwnedByCurrentAgent), a pure predicate that returns true only when the host will actually auto-execute. Also re-homes CHECK_CORRECTNESS_COMMAND_NAME.
  • packages/host/app/services/command-service.tsdrainCommandProcessingQueue calls the helper instead of inlining the three-condition if-chain.
  • packages/host/app/components/matrix/room.gtsreadyCommands filters out auto-executable commands so the Accept All bar can't paint in its manual-approval branch.
  • packages/host/app/components/matrix/room-message-command.gtsapplyButtonState masks the per-command button into the applying spinner immediately for auto-executable commands, so the Run button never flashes.

Key decisions and non-obvious mechanics

  • One source of truth. The action bar, the per-command button, and command-service's drain loop all call the same predicate. They can't drift and reintroduce the flash.
  • Pure function, not a service method. The helper takes only what it needs (command, LLM mode, agent-ownership boolean) so callers compose it locally and it's trivially unit-testable.
  • Agent-ownership matters. Beyond the three auto-exec conditions, command-service refuses to auto-run a message that came from a different agent (message.agentId !== matrixService.agentId). The predicate has to know this — otherwise a requiresApproval=false command from another agent would mask into the spinner forever (caught the first time by an acceptance test, fixed in the latest commit).
  • validate() failures are safe. If command-service's drain decides not to run a command (validation failed), it dispatches an invalid commandResult. The per-command button transitions to its invalid state via the result event — no risk of the spinner sticking.
  • User-perceived flow after the fix (for any auto-executed command): streaming → Working… (existing preparing state) → spinner immediately → result. No Run button, no Accept All / Cancel bar.

Tests

  • Unit (tests/unit/lib/command-auto-execute-test.ts) — covers all three auto-exec branches plus the agent-ownership short-circuit.
  • Integration (tests/integration/components/ai-assistant-panel/commands-test.gts):
    • checkCorrectness (always-auto-execute branch) → Accept All bar suppressed; asserted immediately and after settled().
    • requiresApproval=false via the boxel-environment skill → Accept All bar suppressed.
    • patchCardInstance (requires approval, control) → Accept All bar still renders.
    • checkCorrectness → per-command button shows applying, not ready.

The LLM-mode-'act' branch is covered by the unit test; doing it end-to-end would need extra mode-state plumbing for diminishing returns since all three branches converge on the same predicate.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Preview deployments

Host Test Results

    1 files      1 suites   1h 58m 43s ⏱️
3 123 tests 3 107 ✅ 15 💤 0 ❌ 1 🔥
3 142 runs  3 125 ✅ 15 💤 1 ❌ 1 🔥

Results for commit 1bd7797.

For more details on these errors, see this check.

Realm Server Test Results

    1 files  ±    0      1 suites  ±0   12m 2s ⏱️ -3s
1 735 tests ±    0  1 735 ✅ ±    0  0 💤 ±0  0 ❌ ±0 
3 461 runs  +1 633  3 461 ✅ +1 633  0 💤 ±0  0 ❌ ±0 

Results for commit 1bd7797. ± Comparison against earlier commit ad2e63d.

The manual Accept All / Cancel bar briefly flashed below an assistant
tool call before command-service started auto-executing it. The drain
loop decides "auto-execute" inside an async 100ms-debounced task, but
the room's readyCommands getter only filtered by execution status —
so during the debounce window the bar painted, then yanked itself
when acceptingAllRoomIds flipped.

Pull the auto-execute decision into a synchronous predicate
(isAutoExecutableCommand) and call it from both command-service
(to decide whether to run) and room.gts readyCommands (to decide
whether to render the bar). Single source of truth for the three
conditions — checkCorrectness, requiresApproval=false, LLM mode act.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FadhlanR FadhlanR force-pushed the cs-11647-accept-all-flash-before-check-correctness branch from 19f74e0 to 07dc010 Compare June 18, 2026 13:52
FadhlanR and others added 3 commits June 18, 2026 22:28
isAutoExecutableCommand has three branches (checkCorrectness,
requiresApproval=false, LLM mode 'act') and the readyCommands filter
treats them uniformly. Rename the existing integration test so it
reads as "any always-auto-executed command" instead of a
checkCorrectness-specific case, and add a second test that drives the
requiresApproval=false branch via the boxel-environment skill.

The LLM-mode='act' branch is still locked in by the unit test —
exercising it end-to-end would need extra LLM-mode-state plumbing for
diminishing returns since all three branches converge on the same
filter line in readyCommands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-command Apply button (rendered next to each tool-call message)
flashes through its 'ready' state — a clickable Run button — for the
~100ms debounce window before command-service starts the auto-execute
run. Same root cause as the Accept All bar.

Reuse isAutoExecutableCommand: when status is ready/undefined and the
helper says this command will auto-execute, render the applying state
(spinner) immediately. The button then transitions naturally to
applied on completion. If validate fails in the drain, command-service
emits an invalid commandResult and the button settles into its
invalid state — no risk of the spinner sticking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI surfaced a regression on the per-command Apply button fix:
  Acceptance | Commands tests: ShowCard command added from a skill,
  is not automatically executed when agentId does not match

A ShowCard command sent by another agent has requiresApproval=false
on the skill, so isAutoExecutableCommand returned true and the button
masked into the applying spinner — but command-service refuses to run
it because of the agentId gate (drainCommandProcessingQueue line
354-357). The button would have stuck on the spinner forever.

Thread isOwnedByCurrentAgent through the predicate and short-circuit
to false when it's false. Callers compute it from
`message.agentId === matrixService.agentId`. command-service passes
`true` because its outer loop already short-circuits on the same
condition before reaching the predicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FadhlanR FadhlanR changed the title fix(CS-11647): hide Accept All bar for auto-executed commands fix(CS-11647): hide manual approval UI for auto-executed commands Jun 19, 2026
FadhlanR and others added 4 commits June 19, 2026 11:06
The new isAutoExecutableCommand agent-ownership check fails closed
when message.agentId doesn't equal matrixService.agentId. The new
CS-11647 integration tests went through simulateRemoteMessage without
setting `data.context.agentId`, so the helper short-circuited to
false in the test environment and the Accept All bar / per-command
ready button rendered for an unrelated reason. CI surfaced this on
the two tests that depended on the helper actually returning true:

  not ok 121 ... Accept All bar does not flash for a requiresApproval=false command
  not ok 122 ... per-command Apply button does not flash Run before auto-execute starts

Pass the current agent's id explicitly. The checkCorrectness Accept
All test also gets the context (previously passing by accident
because validate marks status=invalid for an ad-hoc checkCorrectness
call, which the readyCommands filter already excludes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng is stuck

drainCommandProcessingQueue used to `continue` past a wedged room without
emitting any commandResult, so the synthetic "applying" spinner in
room-message-command.gts had no terminal event to fall through and hung
forever. Dispatch an `invalid` commandResult for each auto-executable
command on that path so the UI surfaces the invalidCommandState alert
with a Try Anyway button.

Also extract STUCK_PROCESSING_TIMEOUT_MS so tests can target the same
threshold the prod path uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing state

While the synthetic "applying" spinner is up (auto-executable command
between landing and drainCommandProcessingQueue starting the run task),
the card's idle test attribute used to disagree with the apply button:
the apply button reported applying, the card still reported idle. Drive
both off applyButtonState so the two attributes always agree about
whether the spinner is visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd drop ticket prefix

- New unit-flavoured integration test exercising
  invalidateAutoExecutableCommandsForStuckProcessing directly: spies on
  matrixService.sendCommandResultEvent and asserts only the auto-
  executable command gets an invalid dispatch with the expected
  failureReason. Avoids stubbing the ember-resources proxy, which
  silently no-ops Object.defineProperty.
- Add a coherence assertion that data-test-command-card-idle is omitted
  while the synthetic applying state is on (Glimmer drops attributes
  bound to falsy expressions, so the test selector is presence-based).
- Drop the per-test CS-11647 prefix from titles now that the cluster is
  large enough to read on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FadhlanR FadhlanR marked this pull request as ready for review June 19, 2026 09:27
@FadhlanR FadhlanR requested a review from a team June 19, 2026 09:27
@habdelra habdelra requested a review from Copilot June 19, 2026 12:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a UI flash where manual approval controls briefly appear for commands that the host will auto-execute, by centralizing the auto-execute decision into a shared predicate and using it consistently across the command drain loop and both UI surfaces.

Changes:

  • Introduces isAutoExecutableCommand() (and re-homes CHECK_CORRECTNESS_COMMAND_NAME) as a single source of truth for auto-execution eligibility.
  • Updates command draining + room UI to hide/mask manual approval UI immediately for auto-executable commands.
  • Adds unit + integration tests covering the three auto-exec branches plus the agent-ownership gate.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/host/app/lib/command-auto-execute.ts Adds shared auto-execution predicate and constant to prevent UI/logic drift.
packages/host/app/services/command-service.ts Uses shared predicate for auto-execution; adds stuck-processing invalidation for auto-exec commands.
packages/host/app/components/matrix/room.gts Filters out auto-executable commands from readyCommands to prevent Accept All bar flashing.
packages/host/app/components/matrix/room-message-command.gts Masks per-command Apply button into immediate “applying” state for auto-executable commands.
packages/host/tests/unit/lib/command-auto-execute-test.ts Unit coverage for predicate branches + agent-ownership short-circuit.
packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts Integration coverage ensuring approval UI doesn’t flash for auto-exec paths and still renders for manual approval.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +483 to +497
let invokedToolFromEventId =
this.getCurrentEventIdForCommandRequest(
roomId,
messageCommand.commandRequest.id,
) ?? messageCommand.eventId;
await this.matrixService.sendCommandResultEvent({
roomId,
invokedToolFromEventId,
toolCallId: messageCommand.commandRequest.id!,
status: 'invalid',
failureReason: `Room processing did not finish within ${Math.round(
STUCK_PROCESSING_TIMEOUT_MS / 1000,
)}s; command was not started`,
context: await this.operatorModeStateService.getSummaryForAIBot(),
});
Comment on lines +58 to +62
// How long drainCommandProcessingQueue waits for a room resource that's
// still processing before giving up on the event. In tests we shorten this
// so the stuck-timeout invalidation path can be exercised in a single test
// without holding a real test open for a minute.
const STUCK_PROCESSING_TIMEOUT_MS = isTesting() ? 1000 : 60_000;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants